1st International Workshop on Web Dynamics

Authors

  • Hiroshi Ishikawa
  • Mario Cannataro
  • Alfredo Cuzzocrea
  • Soumen Chakrabarti
Abstract

After crawling and keyword indexing, the next wave to make a significant impact on Web search is topic distillation: analyzing properties of the hyperlink graph to improve the ranking of Web pages returned for a query. Hyperlink-Induced Topic Search (HITS) and PageRank (used in Google) are two examples. The linear algebra behind HITS and PageRank is standard, but how to select the subgraph of the Web to which these algorithms should be applied is considerably less clear. PageRank was intended for the entire Web graph (or as much of it as a crawler can collect), whereas HITS used keyword matching followed by a distance-one graph expansion to determine the relevant subgraph. The clean graph model used in HITS and PageRank, in which pages are nodes with no characteristics beyond a few scalar popularity scores, is also open to question: pages have valuable markup structure and accompanying text. Moreover, the 'hubs', or resource lists, that make HITS so successful are often 'mixed', meaning that only specific regions of those pages are relevant to the query. In this talk we will discuss two enhancements to the graph selection process. First, we will describe a learning system called a "focused crawler", which starts from a few relevant examples and discovers and collects large relevant graphs useful for enhanced topic distillation, without crawling the Web at large. Second, we will discuss a fine-grained model for 'micro-hubs' and new algorithms based on the Minimum Description Length principle that identify the regions of mixed hubs relevant to a query, improving both topic distillation and information extraction. Using analyses and anecdotes, we will argue that as the Web evolves from static files to dynamically generated semi-structured content, these more complex models and algorithms will become crucial to the continued success of automatic resource discovery, extraction, and annotation.
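The subgraph-selection step attributed to HITS above (keyword match, then a distance-one expansion), followed by the standard hub/authority iteration on the resulting subgraph, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method presented in the talk: pages are plain string IDs, the link graph is an in-memory dict `out_links`, the root set is every page whose text contains all query terms, and the helper names `keyword_root_set`, `expand_root_set`, and `hits`, along with the toy data, are hypothetical. Refinements from the original HITS formulation, such as capping how many in-linking pages are added per root page, are omitted.

```python
import math
from collections import defaultdict

# Hypothetical helpers and toy data for illustration; not taken from the talk.


def keyword_root_set(query, page_text):
    """Root set: pages whose text contains every query term (assumed matcher)."""
    terms = query.lower().split()
    return {p for p, text in page_text.items()
            if all(t in text.lower().split() for t in terms)}


def expand_root_set(root, out_links):
    """Distance-one expansion: root pages plus every page they link to or are linked from."""
    in_links = defaultdict(set)
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links[dst].add(src)
    base = set(root)
    for page in root:
        base |= set(out_links.get(page, ()))
        base |= in_links[page]
    return base


def hits(base, out_links, iterations=50):
    """Hub/authority power iteration restricted to links inside the base set."""
    edges = [(u, v) for u in base for v in out_links.get(u, ()) if v in base]
    hub = {p: 1.0 for p in base}
    auth = {p: 0.0 for p in base}
    for _ in range(iterations):
        # Authority of v = sum of hub scores of pages linking to v.
        auth = {p: 0.0 for p in base}
        for u, v in edges:
            auth[v] += hub[u]
        # Hub of u = sum of authority scores of pages u links to.
        hub = {p: 0.0 for p in base}
        for u, v in edges:
            hub[u] += auth[v]
        # L2-normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Toy graph (hypothetical): a and b are hub-like pages listing x and y.
out_links = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"x"},   # links to x, but neither matches the query nor touches the root set
    "z": {"a"},
}
page_text = {
    "a": "web search ranking", "b": "link ranking hubs", "c": "cooking recipes",
    "x": "pagerank paper", "y": "hits paper", "z": "my bookmarks",
}

root = keyword_root_set("ranking", page_text)
base = expand_root_set(root, out_links)
hub, auth = hits(base, out_links)
print(sorted(base))  # ['a', 'b', 'x', 'y', 'z']
print({p: round(s, 3) for p, s in sorted(auth.items())})
# {'a': 0.0, 'b': 0.0, 'x': 0.707, 'y': 0.707, 'z': 0.0}
```

Page `c` links to the authority `x`, yet the distance-one expansion leaves it out because it neither matches the query nor is adjacent to a matching page. Small choices like this in how the subgraph is built are exactly the part the abstract describes as much less settled than the eigenvector computation itself.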

Similar sources

Building Technology-Enhanced Learning Solutions for Communities of Practice i

I am very pleased to introduce this special issue of International Journal of Web-based Learning and Teaching Technologies, edited by Nikos Karacapilidis, from the University of Patras, Greece. The articles included are an enhanced version of selected communications that were first presented at TEL-CoPs’06, the 1st International Workshop on Building Technology Enhanced Learning solutions for ...

Integrating heterogeneous web service styles with flexible semantic web services groundings

Integrating heterogeneous web service styles with flexible semantic web services groundings Conference Item How to cite: Lambert, David; Benn, Neil and Domingue, John (2010). Integrating heterogeneous web service styles with flexible semantic web services groundings. In: 1st International Future Enterprise Systems Workshop (FES2010) at The 3rd Future Internet Symposium (FIS2010), 20-22 Sept 201...

1st International Workshop on E-mails in E-commerce and Enterprise Context (E3C)

The 1st International Workshop on Email in e-Commerce and Enterprise Contexts (E3C) was held on July 20th 2009 at the 11th IEEE Conference on Commerce and Enterprise Computing (CEC 2009) in Vienna, Austria. The E3C workshop brought together email and enterprise computing researchers and practitioners to present recent email research, software prototypes and to discuss the role and potential of ...

IS 2009 - PC Co-chairs' Message

On behalf of the Program Committee of the 1st International Workshop on Information Security (IS 2006), it was our great pleasure to welcome the participants to IS 2006, held in conjunction with OnTheMove Federated Conferences (OTM 2006), from October 30 to November 1, 2006, in Montpellier, France. In recent years, significant advances in information security have been made throughout the world...

1st International Workshop on: 'Designing for Participatory Learning' Building from Open Source Success to Develop Free Ways to Share and Learn

The Open Source world shows how volunteer collaboration can lead to great products and to great learning. We want to further explore at this workshop what happens using approaches from that community to break barriers between teachers and learners for today's Internet-savvy young people to design and co-construct sites for participatory learning. The aim of this workshop is to explore the barri...

Combining the Best of Both Worlds: A Semantic Web Book Mashup as a Linked Data Service Over CMS Infrastructure

syntax. W3C recommendation. W3C. Retrieved from http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ Ngonga Ngomo, A.-C., Heino, N., Lyko, K., Speck, R., & Kaltenböck, M. (2011). SCMS—Semantifying content management systems. In Proceedings of the 10th International Semantic Web Conference (ISWC). Bonn, Germany, October 23– 27, Part II, 189–204. Noppens, O., Luther, M., & Liebig, T. (2010). The ...

Publication date: 2007